Sort-merge Join
   HOME

TheInfoList



OR:

The sort-merge join (also known as merge join) is a join algorithm and is used in the implementation of a relational
database management system In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and an ...
. The basic problem of a join algorithm is to find, for each distinct value of the join attribute, the set of
tuple In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...
s in each relation which display that value. The key idea of the sort-merge algorithm is to first sort the relations by the join attribute, so that interleaved linear scans will encounter these sets at the same time. In practice, the most expensive part of performing a sort-merge join is arranging for both inputs to the algorithm to be presented in sorted order. This can be achieved via an explicit sort operation (often an external sort), or by taking advantage of a pre-existing ordering in one or both of the join relations. The latter condition, called interesting order, can occur because an input to the join might be produced by an index scan of a tree-based index, another merge join, or some other plan operator that happens to produce output sorted on an appropriate key. Interesting orders need not be serendipitous: the optimizer may seek out this possibility and choose a plan that is suboptimal for a specific preceding operation if it yields an interesting order that one or more downstream nodes can exploit.


Complexity

Let R and S be relations where , R, <, S, . R fits in P_ pages memory and S fits in P_ pages memory. In the worst case, a sort-merge join will run in O(P_+P_) I/O operations. In the case that R and S are not ordered the worst case time cost will contain additional terms of sorting time: O(P_+P_+P_\log(P_)+ P_\log(P_)), which equals O(P_\log(P_)+ P_\log(P_)) (as
linearithmic In theoretical computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations p ...
terms outweigh the linear terms, see Big O notation – Orders of common functions).


Pseudocode

For simplicity, the algorithm is described in the case of an
inner join A join clause in the Structured Query Language ( SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same ro ...
of two relations ''left'' and ''right''. Generalization to other join types is straightforward. The output of the algorithm will contain only rows contained in the ''left'' and ''right'' relation and duplicates form a
Cartesian product In mathematics, specifically set theory, the Cartesian product of two sets and , denoted , is the set of all ordered pairs where is an element of and is an element of . In terms of set-builder notation, that is A\times B = \. A table c ...
. function Sort-Merge Join(left: Relation, right: Relation, comparator: Comparator) Since the comparison logic is not the central aspect of this algorithm, it is hidden behind a generic comparator and can also consist of several comparison criteria (e.g. multiple columns). The compare function should return if a row is ''less(-1)'', ''equal(0)'' or ''bigger(1)'' than another row: function compare(leftRow: RelationRow, rightRow: RelationRow): number Note that a relation in terms of this pseudocode supports some basic operations: interface Relation


Simple C# implementation

Note that this implementation assumes the join attributes are unique, i.e., there is no need to output multiple tuples for a given value of the key. public class MergeJoin public class Relation


See also

* Hash join * Nested loop join


References


External links

C# Implementations of Various Join Algorithms
{{DEFAULTSORT:Sort-Merge Join Join algorithms Articles with example pseudocode Articles with example C Sharp code